- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.48)
Last-Iterate Analyses of FTRL in Stochastic Bandits
Zhan, Jingxin, Han, Yuze, Zhang, Zhihua
The convergence analysis of online learning algorithms is central to machine learning theory, where last-iterate convergence is particularly important: it captures the learner's actual decisions and describes the evolution of the learning process over time. However, in multi-armed bandits, most existing analyses focus on the order of the regret, while the last-iterate (simple regret) convergence rate remains less explored -- especially for the widely studied Follow-the-Regularized-Leader (FTRL) algorithms. Recently, a growing line of work has established the Best-of-Both-Worlds (BOBW) property of FTRL algorithms in bandit problems, showing in particular that they achieve logarithmic regret in stochastic bandits. Nevertheless, their last-iterate convergence rate has not yet been studied. Intuitively, logarithmic regret should correspond to a $t^{-1}$ last-iterate convergence rate. This paper partially confirms this intuition through theoretical analysis: the Bregman divergence, defined by the regularizer $\Psi(p) = -4\sum_{i=1}^{d}\sqrt{p_i}$ associated with the BOBW FTRL algorithm $1/2$-Tsallis-INF (arXiv:1807.07623), between the point mass on the optimal arm and the probability distribution over the arm set obtained at iteration $t$, decays at a rate of $t^{-1/2}$.
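As a concrete illustration (not taken from the paper), the Bregman divergence under the $1/2$-Tsallis regularizer $\Psi(p) = -4\sum_i \sqrt{p_i}$ can be evaluated directly from its definition $D_\Psi(x, y) = \Psi(x) - \Psi(y) - \langle \nabla\Psi(y), x - y \rangle$. The arm count and distributions below are hypothetical examples:

```python
import math

def tsallis_half_potential(p):
    # Psi(p) = -4 * sum_i sqrt(p_i), the 1/2-Tsallis regularizer from the abstract
    return -4.0 * sum(math.sqrt(pi) for pi in p)

def bregman_divergence(x, y):
    # D_Psi(x, y) = Psi(x) - Psi(y) - <grad Psi(y), x - y>,
    # with grad Psi(y)_i = -2 / sqrt(y_i); requires y_i > 0.
    grad = [-2.0 / math.sqrt(yi) for yi in y]
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad, x, y))
    return tsallis_half_potential(x) - tsallis_half_potential(y) - inner

# Point mass on arm 0 versus a distribution concentrating on arm 0.
e_star = [1.0, 0.0, 0.0]
p_t = [0.9, 0.05, 0.05]
d = bregman_divergence(e_star, p_t)
# d >= 0 (Psi is convex) and shrinks as p_t concentrates on the optimal arm,
# which is the quantity the paper shows decays at rate t^{-1/2}.
```

Since $\Psi$ is convex, the divergence is nonnegative and vanishes exactly when $p_t$ equals the point mass on the optimal arm.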
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.49)
Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds
Ito, Shinji, Tsuchiya, Taira, Honda, Junya
Follow-The-Regularized-Leader (FTRL) is known as an effective and versatile approach in online learning, where an appropriate choice of the learning rate is crucial for small regret. To this end, we formulate the problem of adjusting FTRL's learning rate as a sequential decision-making problem and introduce the framework of competitive analysis. We establish a lower bound on the competitive ratio and propose update rules for the learning rate that achieve an upper bound within a constant factor of this lower bound. Specifically, we show that the optimal competitive ratio is characterized by the (approximate) monotonicity of the components of the penalty term: a constant competitive ratio is achievable if the components of the penalty term form a monotonically non-increasing sequence, and a tight competitive ratio is derived when the penalty terms are $\xi$-approximately monotone non-increasing. Our proposed update rule, referred to as \textit{stability-penalty matching}, also facilitates constructing Best-of-Both-Worlds (BOBW) algorithms for stochastic and adversarial environments. In these environments, our results yield tighter regret bounds and broaden the applicability of algorithms to various settings, such as multi-armed bandits, graph bandits, linear bandits, and contextual bandits.
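To make the role of the learning rate concrete, here is a minimal sketch of FTRL over the simplex with the negative-entropy regularizer, where the update reduces to exponential weights $p_i \propto \exp(-\eta_t L_i)$. The schedule $\eta_t = \sqrt{\log d / t}$ is a standard generic decreasing rate used for illustration only; it is an assumption, not the paper's stability-penalty matching rule:

```python
import math

def ftrl_entropy_step(cum_losses, eta):
    # FTRL with negative-entropy regularizer over the simplex reduces to
    # exponential weights: p_i proportional to exp(-eta * L_i).
    weights = [math.exp(-eta * L) for L in cum_losses]
    z = sum(weights)
    return [w / z for w in weights]

def run_ftrl(loss_rows):
    # Play FTRL with a generic decreasing rate eta_t = sqrt(log d / t)
    # (illustrative stand-in for an adaptive learning-rate rule).
    d = len(loss_rows[0])
    cum = [0.0] * d
    dists = []
    for t, losses in enumerate(loss_rows, start=1):
        eta = math.sqrt(math.log(d) / t)
        p = ftrl_entropy_step(cum, eta)
        dists.append(p)
        cum = [c + l for c, l in zip(cum, losses)]
    return dists

# Arm 0 consistently suffers less loss, so the played distribution
# should shift mass toward it over time.
dists = run_ftrl([[0.1, 0.9]] * 20)
```

The paper's contribution can be read as replacing the fixed schedule above with an update rule whose rate is chosen competitively against the best schedule in hindsight.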
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.68)
On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits
Coordinating multiple agents that can communicate with each other to make decisions under uncertainty is a classical problem with many applications in computer science (Lynch, 1996), game theory (Chakravarty et al., 2014), and machine learning (Lanctot et al., 2017). We consider the multi-agent version of the multi-armed bandit problem, one of the most fundamental decision-making problems under uncertainty. In this problem, a learning agent needs to manage the exploration-exploitation trade-off, i.e., balancing the exploration of various actions in order to learn how rewarding they are against the selection of high-rewarding actions. In the multi-agent version, multiple agents collaborate with each other to maximize their individual cumulative rewards, and the challenge is to design efficient cooperative algorithms under communication constraints. We consider the nonstochastic (adversarial) multi-armed bandit problem in a cooperative multi-agent setting, with $K \geq 2$ arms and $N \geq 1$ agents.